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ABSTRACT 

We use the semi-analytic model ChemTreeN, coupled to cosmological A/-body simulations, to explore how 
different galaxy formation histories can affect observational properties of Milky Way-like galaxies' stellar 
haloes and their satellite populations. Gaussian processes are used to generate model emulators that allow 
one to statistically estimate a desired set of model outputs at any location of a p-dimensional input parameter 
space. This enables one to explore the full input parameter space orders of magnitude faster than could be done 
otherwise. Using mock observational data sets generated by ChemTreeN itself, we show that it is possible to 
successfully recover the input parameter vectors used to generate the mock observables if the merger history of 
the host halo is known. However, our results indicate that for a given observational data set the determination 
of "best fit" parameters is highly susceptible to the particular merger history of the host. Very different halo 
merger histories can reproduce the same observational dataset, if the "best fit" parameters are allowed to vary 
from history to history. Thus, attempts to characterize the formation history of the Milky Way using these kind 
of techniques must be performed statistically, analyzing large samples of high resolution A/-body simulations. 
Subject headings: galaxies: formation - Galaxy: formation - Galaxy: halo - methods: statistical - methods: 
analytical - methods: A/-body simulations 



1. INTRODUCTION 

Understanding the formation and evolution of galaxies is a 
central and long-standing problem in astrophysics. Over the 
past century, and particularly in the past decade, a tremendous 
amount of information has been gleaned about populations of 
galaxies and their temporal evolution, and data have been col- 
lected on galaxies spanning more than six orders of magnitude 
in stellar mass and over thirteen billion years in the age of the 
Universe. These observations show that the galaxies that we 
can see have undergone radical changes in si ze, appearance, 
and content over the last thirteen billion years (Rix et al. 2004[ 
IBrinchmann et al.|2004||Papoyich et al.[2005||shapley|201 1) . 
Complementary observations have provided a rich data-set on 
the kinematics and elemental abundances of stars in our own 
Milky Way, including large numbers of metal-poor stars in the 
halo of our own galaxy and in local dwarf galaxies. In prin- 
ciple, this 'galactic fossil record' can probe the entire merger 
and star formation history of the Milky Way and its satellites, 
and complement direct observations at higher redshifts. 



The quantity and quality of observational data on galaxy 
formation, which is already staggering, is going to increase 
expon entially over the ne xt decade. Sur veys such as LAM - 
OST ([Newberg et al |2009b SkyMapper ( |Keller et al.|2007| ), 



Gaia (Perryman et al.||2001|l, and, ultimately, the Large Syn 
optic Survey Telescope ( |LSST Science Coll aboration s et al.| 
2009 ) will produce petabytes of data on billions of individ- 
ual objects, both galactic and extra galactic, that will strongly 
inform our understanding of galaxy behavior. 

Despite this wealth of observational information, we cur- 
rently lack the detailed and self-consistent theoretical models 
necessary to adequately interpret such observational data sets. 
Purely analytic {i.e., "pencil-and-paper") theoretical models 
are insufficient to address the questions that are currently be- 
ing asked about galaxy formation, due in no small part to 
the range of physical components that must be simultane- 
ously modeled (e.g., gravity, dark matter, gas dynamics, ra- 
diative cooling, star formation and feedback), and the com- 
plex and non-linear coupling of these components. As a re- 
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suit of these complications, two separate theoretical methods 
are commonly used to study galaxy formation: multiphysics 
hydrodynamical simulations and semi-analytic models. 

Multi-physics numerical simulations are typically used to 
model galaxy formation by implementing relevant physical 
processes in as realistic a manner as is technically and com- 
putationally feasible. These calculations are typically based 
on A^-body dark matter dynamics simulations of cosmolog- 
ical structure formation, and include gas dynamics, the ra- 
diative cooling and heating of gas, subgrid models for star 
formation and feedback, and possibly more complex physics 
such as magneto-hydrodynamics, radiation transport, and the 
formation of, and feedback from, super massive blac k holes. 
Commonly-used codes of this type include Enzo (O'Shea 



eTaT][2004l iNorman et al ||2007> , Gadget (|Springel 2005a), 
Gasoline qWadsley rt^ptaK_RA^^ S lT^sag gD^ ), 
and more recently AREPO (Spring el|2010[ ). These codes pro- 
duce broadly similar results, although some important differ- 



ences remain to be resolv ed (O'She a et al.|2005| Agertz et al. 
2007] [Tasker et al.|[2008t |Sijacki et al.||201 1} . The main ad- 
vantage of such calculations is that they attempt to faithfully 
reproduce the relevant physical processes in as accurate of a 
manner as possible, and by virtue of their construction au- 
tomatically include any complex, non-linear interaction be- 
tween important physical processes. For example, this ap- 
proach is able to naturally handle the non-linear hydrodynam- 
ics of gas ejecta (once initialized) due to bursts of star for- 
mation, as does the return of such gas into later generations 
of structure formation. The main disadvantage of this sort 
of simulation lies in its cost: current-generation calculations 
of a single Milky Way-lik e galaxy performed at high (~ 100 
pc) spatial resolution (e.g. Agertz et al.|201 1| can easily con- 
sume hundreds of thousands of CPU hours and months of 
time to complete, making it challenging to model statistically- 
significant numbers of galaxies or to perform a meaningful 
study of variations in free parameters within the models. 

A second approach is often referred to as "semi-analytic" 
or "phenomenological" modeling of galaxy formation. This 
type of model typically is based upon either the extended 
Press-Schechter formalism or iV-body cosmological simula- 
tions, which provide the evolutionary histories for a popu- 
lation of galaxies. Prescriptions are then applied on top of 
these evolutionary histories to describe the behavior of the 
gas and stellar populations contained within, and surrounding, 
the dark matter halos that drive dynamics on large scales, as 
well as the observational properties of the resulting galaxies. 
These models are then calibrated by comparison to some set 
of observatio ns. Some examples of this sort of model include 
GALFORM (Benso n"et~al.|2003HBower et al.|2006l |Benson| 
|& Bower|2010|), Galact icus ( |Benson|2012| ), and ChemTreeN 
(Tumlinson 2006 2010). Two important strengths of this type 
of model are flexibility and speed: one can easily implement 
variations on a model (gas ejection from galaxies as a func- 
tion of halo mass and redshift versus a constant value) and 
then see within minutes how this affects the modeled popula- 
tion of galaxies. Alternately, one can rapidly sweep through 
large swaths of parameter space, finding the best-fit model 
to a given set of observational constraints. The disadvan- 
tages of this modeling technique include the extent to which 
the observable properties of simulated galaxies depend on the 
models of specific physical phenomena, such as the behav- 
ior of galaxies during mergers, as well as the large number of 
free parameters. Note that the latter also pertains to hydrody- 
namical simulations, although to a lesser extent. Even with 



these substantial downsides, however, semi-analytic models 
are useful for exploring the consequences of various physical 
phenomena on the observable properties of galaxies. 

One consequence of the large number of free parameters 
in semi-analytic models is that there is rarely a single "best 
fit" set of parameters to a given set of observations. Rather, 
given N free parameters that define an A^-dimensional space 
of potential values, it is likely that there is an M-dimensional 
surface, with M < N, of approximately equally good statisti- 
cal fit to the observations in question, or even possibly several 
discrete regions within the parameter space that provide com- 
parably good statistical fits to the chosen observations. When 
multiple observational data sets are used, or observations of 
different populations, it is possible that a given model param- 
eter may only be relevant for a subset of the observations (and, 
thus, the other observations provide no significant constraints 
upon this parameter). This challenge has only recently begun 
to be explored in t he context of galaxy for mation by |Hen-| 
riques et al.|d2009]), IBower et al.|d20T0l) andlLu et al.|(2011 
2012) 



It is also desirable to attempt to model the formation of the 
Milky Way as an exemplar of galaxy formation in general. 
The primary reason for doing so is that, by virtue of the Earth 
being embedded within the Milky Way, we p ossess a massive 
amount of data on this particular object (e.g. Dame et al. 2001 



Yanny et al 12003) |Carollo et al.|2007| |lvezic et al.|2008[ |Ju 



ric et al.|2008[|Ghez et al. 2008, Bond et al. 2010J and many 
others). One important drawback of modeling a single object, 
however, is that the most fundamental part of either type of 
galaxy formation model described above - the gravitationally- 
driven merger of successive generations of dark matter halos 
- is unique to a given halo, and may profoundly affect many 
observational quantities. As a result, there may be critical de- 
generacies between the merger history of a single galaxy and 
the model parameters. This is not an i ssue when studying 



large numbers of galaxies, as is done by Bower et al. (2010 
hereafter B10), since the varieties of merger histories are in 
eluded in the overall population. However, it presents sub- 
stantial complications when trying to duplicate observations 
of the Milky Way using either hydrodynamic or semi-analytic 
models. 

In this paper, we combine semi-analytic models of the 
formation of the Milky Way (including several different N- 
body simulation-based merger histories) with modern statis- 
tical techniques to explore how one might meaningfully con- 
strain the formation of the Milky Way's stellar halo and pop- 
ulation of satellite galaxies both from a theoretical standpoint 
and in terms of guiding future observations. This paper is 
arranged as follows: in Section [2] we describe our method- 
ology, including the cosmologicalsimulations, semi-analytic 
models, and in Section [3] the statistical tools. In Section |4j 
we examine how a small set of model parameters affects ob- 
servational quantities of our own Milky Way in a qualitative 
way, and do the same in a statistical sense in Section [5] We 
discuss the results and limitations of this work in Section [6] 

2. NUMERICAL METHODS 

In this Section, we briefly describe the A^-body simulations 
analyzed in this work and provide a brief summary of the main 
characteristic of the semi-analytical model, ChemTreeN. For 
a detail de scription of our numerical methods, we refer the 
reader to Tumlinson (2010, hereafter T10). 

2.1. N -body simulations 
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TABLE 1 

Main properties at z = of the four Dark Matter 
halos analyzed in this work. 



Name 




M 200 b 


c 


ZLMM 


MW1 


381 


1.63 


12.2 


2.1 


MW2 


378 


1.59 


9.2 


3.5 


MW3 


347 


1.23 


15.5 


2.0 


MW4 C 


366 


1.44 


13.6 


3.0 



11 Distances are listed in kpc 

Masses are listed in 10 12 M 
c MW4 corresponds to the simulation MW6 presented in T10 

Four different simulations of the formation of Milky Way- 
like dark matter (DM) haloes are analyzed in this work. The 
simulations were run using Gadget-2 ( Springel 2005b ) in a lo- 
cal computer cluster. Milky Way-like haloes were first iden- 
tified in a cosmological simulation with a particle resolu- 
tion of 128 3 within a periodic box of side 7.32 h~ l Mpc. A 
WMAP3 cosmology (Spergel et al. 2007) was adopted, with 
matter density fi m = 0.238, baryon density fi/, = 0.0416, vac- 
uum energy density J7a = 0.762, power spectrum normaliza- 
tion erg = 0.761, power spectrum slope n s = 0.958, and Hubble 
constant Ho = 73.2 km s -1 Mpc -1 . The candidates were se- 
lected to have gravitationally-bound dark matter halos with 
virial masses of M200 ~ 1.5 x 10 12 M Q at z = and no major 
mergers since z = 1.5 - 2. These Milky Way-like dark matter 
haloes were subsequently re-simulated at a resolution of 512 3 
by applying multi-mass particle 'zoom-in' technique. At this 
resolution, each DM particle has a mass of M p = 2.64 x 10 5 
M Q . Snapshots were generated at intervals of 20 Myr be- 
fore z = 4 and 75 Myr interval s form z = 4 to z = . A six di- 
mensional friends-of-friends (Diemand et al. 2006) algorithm 
was applied to identify dark matter halos in each snapshot. 
The gravitational softening length was 100 comoving pc in all 
simulations. The main properties of the resulting dark matter 
halos are listed in Table Q] 

2.2. Galactic chemical evolution model 

Dark matter only simulations are a useful tool for self- 
consistently characterizing the mass assembly history of dark 
matter halos in a CDM cosmology. Yet they don't provide any 
information about the stellar populations of a galaxy such as 
our own Milky Way. For this purpose, we have coupled to our 
Af-body simulations the semi-analytical model, ChemTreeN 
(T10). A semi-analytical model consists of a set of analytic 
prescriptions that describe the underlying physical mecha- 
nisms driving the evolution of the baryons. Processes such as 
star formation, stellar winds or chemical enrichment are in- 
troduced in the model through differential equations that are 
controlled via a set of adjustable input parameters. These pa- 
rameters are commonly set to simultaneously match differ- 
ent observable quantities, such as the galaxy luminosity func- 
tions or a variety of scaling relations. The general approach 
used in our models is to assume that each dark matter halo 
found in the simulations, and followed through the merger 
tree, possesses gas that has been accreted from the intergalac- 
tic medium (IGM), that this gas forms stars, that these stars 
return metals and energy to the host halo and to the larger en- 
vironment, and that future generations of stars form with the 
now metal -enriched gas. Table [2] summarizes the numerical 
values of the parameters used for our fiducial models. The set 
of parameters that are considered in the model emulator anal- 
ysis, as well as the range of values over which they are allow 



to vary, are also listed in this table. In what follows we briefly 
describe the implementation of the prescriptions that are most 
relevant to this work. 

2.2.1. Parcels and Particle Tagging 

ChemTreeN follows the evolution of stellar mass, metal 
abundance and gas budgets of galaxies as a function of cosmic 
time. For each endpoint or "root halo" in which the star for- 
mation and chemical enrichment history will be determined, 
the code identifies its highest-redshift progenitor and starts 
the calculation there. Note that, due to tidal disruption, the 
endpoint of some halos can be found at z > 0. The star for- 
mation history is calculated using 10 timesteps between each 
redshift snapshot, so timesteps are 2-8 Myr. At each timestep 
a star formation "parcel" is created with a single initial mass, 
metallically, and IMF. The metallicity for the parcel is derived 
from the present gas metallicity. Each parcel thus represents a 
single-age stellar population with a unique metallicity. When 
halos merge, their lists of parcels are concatenated. The calcu- 
lation concludes when the "root halo" or endpoint is reached. 
Note that this process is performed independently for each 
satellite galaxy or building block that gave rise to the final 
Milky Way-like halo. To explore the spatial, kinematic, and 
dynamical properties of stellar populations in the resulting ha- 
los it is necessary to assign "stars" to dark matter particles 
in the A^-body simulation. This is performed as follows. At 
each snapshot output a fraction of the most bound particles 
in each halo are selected. The star formation parcels that oc- 
curred between this snapshot and the previous snapshots are 
then identified, and an equal fraction of star-forming parcels 
are assigned to each of the selected particles. Note that only 
the 10% most gravitationally bound particles in each halo are 
considered, in order to approximate the effect of stars form- 
ing deep in the potential well of the galaxy, w here dense gas 
would be most likely to collect. As shown by Coope Fet al.| 
(2010), to reproduce the observed relationship between size 
and luminosity of dwarf galaxies in the Local Group, a frac- 
tion of 1 - 3% is required. This choice is not feasible here due 
to the limited resolution of our A^-simulations. However, gross 
properties of dwarf galaxies such as total luminosity are not 
strongly affected by our particular choice. Furthermore, rather 
than comparing with real observational data, in this work we 
focus on mock observational data sets generated by the model 
itself. 

2.2.2. Baryon Assignment 

The number of stars that a galaxy has formed throughout its 
history strongly depends on the amount of gas it contained. It 
is therefore important to define a prescription to model bary- 
onic accretion into dark matt er halos. Our models a dopt a 
prescription based on that of Bullock & Johnston (2005) that 
heuristially takes into account the influence of a photoionizing 
background from the aggregate star formation in all galaxies. 
This model assigns a fixed mass fraction of baryons, /b ary , to 
all DM halos before re-ionization, z r . After z r , gas accretion 
and therefore star formation in small halos are suppressed be- 
low v c = 30 km s" 1 . Between 30 km s" 1 and 50 km s" 1 , the 
assigned baryon fraction varies linearly from to /bary- This 
baryon assignment is intended to capture the IGM "filtering 
mass" (Gnedin 2000) below which halos are too small to re- 
tain baryons that have been heated to T > 10 4 K by global 
re-ionization. 

2.2.3. Star Formation Efficiency 
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TABLE 2 
Model Parameters. 




Parameter 


Fiducial Value 


Range 


Description 


Explored 


Zr 
/bary 
/esc 

e* 

< 

fla 

£SN 

,„Ia 
m Fe 


10 

0.05 
50 

1 X 1U 

0.07 
0.015 
0.0015 
0.5 


5-19 
0-0.2 
0-110 
0.2-1.8 
0.04 - 0.2 


Epoch of re-ionization 
Baryonic mass fraction 
Escape factor of metals 
Star formation efficiency (10~ 10 yr -1 ) 
SN II iron yield (M Q ) 

SN la probability 
SNe energy coupling 
SN la iron yield (Mq) 


Yes 
Yes 
Yes 

icS 

Yes 
No 
No 
No 



Stars are formed in discrete "parcels" with a constant effi- 
ciency, e*, such that the mass formed into stars = e*M gas Af 
in time interval Af . The star formation efficiency is equivalent 
to a timescale, e* = 1/?*, on which baryons are converted into 
stars. The fiducial choice for this parameter is t t = 10 Gyr, or 

e„ = 10- 10 yr _1 - 

2.2.4. Stellar Initial Mass Function 

An invariant stellar initial mass function (IMF) at all times 
and at al l metallicities is assumed. The invariant IMF adopted 
is that of |Kroupa|(|2001| l, dn/dM oc (w/M ) a , with slope a = 
-2.3 from 0.5 - lWM Q and slope a = -1.3 from 0.1 - 0.5 
Mq . The impact that variations of the IMF may have on the 
stellar populations of the resulting stellar halos will be studied 
in a follow-up work. 

2.2.5. Type la SNe 

Type la SNe are assumed to arise from thermonuclear ex- 
plosions triggered by the collapse of a C/O white dwarf pre- 
cursor that has slowly accreted mass from a binary compan- 
ion until it exceeds the 1 .4 M Q Chandrasekhar limit. For stars 
that evolve into white dwarfs as binaries, the SN occurs after a 
time delay from formation that is roughly equal to the lifetime 
of the least massive companion. In our models, stars with ini- 
tial mass M = 1.5-8 M are considered eligible to eventually 
yield a Type la SN. When stars in this mass range are formed, 
some fraction of them, //„, are assigned status as a Type la 
and given a binary companion with mass obta ined from a suit- 
able probability distribution (Greggio & Renzini|1983[ l. The 
chemical evolution results are sensitive to the SN la probabil- 
ity normalization, //„. This parameter is fixed by normalizing 
to the observed relative rates of Type II and Type la SNe fo r 
spiral galaxies in the local universe (Tamm ann et al.| |l994). 
This normalization gives a ratio of SN II to la of 6 to 1 . 

2.2.6. Chemical Yields 

ChemTreeN tracks the time evolution of galaxies' bulk 
metallicities by considering Fe as the proxy reference ele- 
ment. For Type la S Ne with 1.5- 8 Mq the models adopt the 
W7 yields of |Nomoto eFaTI ( |T997l > for Fe, with 0.5 M of Fe 
from each Type la SN . Type TTSNe are assumed to ari se from 
stars o f 10 to 40 M Q , with mass yields provided by Tominaga 
(2009). They represent the bulk yields of core-collapse SNe 



with uniform explosion energy E = 10 erg. These models 
have M = 0.07 - 0.15 Mq Fe per event. 

2.2 J. Chemical and Kinematic Feedback 

One possible cause of the observed luminosity-metallicity 
(L-Z) relation for Local Group d warf galaxies is SN- driven 
mass loss from small DM halos (Dekel & Woo 2003}. To 



model this physical mechanism, ChemTreeN tracks mass loss 
due to SN-driven winds in terms of the number of SNe per 
timestep in a way that takes into account the intrinsic time 
variability in the star formation rate and rate of SNe from a 
stochastically sampled IMF. At each timestep, a mass of gas 



Miost = £sn 



E -^SN^SN 
. 2v? ; „ 



(1) 



becomes unbound and is removed permanently from the gas 
reservoir. Here Voire is the maximum circular velocity of the 
halo, TYsn is the number of SNe occurring in a given timestep 
and £sn is the energy released by those SNe. The only free 
parameter, 6sn, expresses the fraction of the SN energy that is 
converted to kinetic energy retained by the wind as it escapes. 
The sum over index i sums over all massive stars formed in 
past timesteps that are just undergoing an explosion in the cur- 
rent timestep. Note that this approach allows for variations in 
the number and energy of SNe from timestep to timestep. 

The selective loss of metals that should arise when SNe 
drive their own ejecta out of the host galaxy is captured by 
the parameter / esc , which expresses the increased metallicity 
of the ejected winds with respect to the ambient interstellar 
medium. At each timestep, a total mass in iron Mf* st is re- 
moved from the gas reservoir of the halo: 



Mf;^ = / esc M l0 



M 



ISM 



(2) 



where Mf£ M is the total mass of iron in the ambient inter- 
stellar medium, M„ as x 10 [Fe / H] . This prescription ensures 
that, on average, the ejected winds are / esc times more metal- 
enriched than the ambient interstellar medium. Alternatively, 
the fraction of metal mass lost from the halo is / esc times 
higher than the total fraction of gas mass lost. 

2.2.8. Isochrones and Synthetic Stellar Populations 

To compare these model halos to observational data on the 
real MW and its dwarf satellites, it is necessary to calculate 
the luminosities and colors of model stellar populations using 
pre-calculated isochrones and population synthesis models. 
Each star formation parcel possesses a metallicity, age and 
a total initial mass distributed according to the assumed IMF. 
These three quantities together uniquely specify an isochrone 
an d how it is populate d. The models adopt the isochrones 
of IGirardi et aT] ( |2Q02| [20041 ) for the UBVRIJHK and SDSS 
ugriz systems, respectively, as published on the Padova group 
website 1 . The lowest available metallicity in these isochrones 
is [Fe/H] = -2.3. Thus, this value is used to represent stellar 
populations with lower metallicities. 
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3. STATISTICAL METHODS 

As discussed in the introduction, semi-analytic models such 
as ChemTreeN need to be calibrated by comparing their out- 
puts to an observational data set. It is important to explore 
the input parameter space as thoroughly as possible to iden- 
tify regions where good fits to the data can be achieved. The 
simplest method of exploring the dependence of interesting 
model outputs on these input parameters would be to run the 
model over a grid of points, densely sampling the parame- 
ter space. Although ChemTreeN can generate a new model 
within a few minutes, this technique quickly becomes pro- 
hibitively expensive as the dimensionally of the input param- 
eter space increases. As an alternative to this approa ch we 
can construct a Gau ssian Proce ss emulator (|O , Hagan||2006[ 
IQakley & O , Hagan|2002| [2004] [Kennedy & O'Ha gan 2000), 
which acts as a statistical model of our computer model, 
ChemTreeN. An emulator is constructed by conditioning (i.e., 
constraining fits as described below) a prior Gaussian Process 
on a finite set of observations of model output, taken at points 
dispersed throughout the parameter space. Once the emula- 
tor is trained it can rapidly give predictions for both model 
outputs and an attendant measure of uncertainty about these 
outputs at any point in the parameter space. The probabil- 
ity distribution for the model output at all points in parame- 
ter space is a very useful feature of Gaussian Process emu- 
lators - simpler interpolation schemes, such as interpolating 
polynomials, produce an estimate of the model output at a 
given location in the parameter space with no indication as 
to how much this value should be trusted. Furthermore, nu- 
merical implementations of Gaussian Process emulators are 
very computationally efficient (producing output in microsec- 
onds rather than minutes), making it feasible to predict vast 
numbers of model outputs in a short period of time. This abil- 
ity opens many new doors for the analysis of computer codes 
which would otherwise require unac ceptable amounts of time 
( |Higdon et al.|2008[|Bay"arri et al.|2002| B10). 



3.1. Gaussian Process model emulator 

We construct an emulator for a model by conditioning 
a Ga ussian Process prior (se e Figure [T| on the training 
data ([Chiles & Delnner|[T939l |Cressie|[r993) |Rasmussen"~&] 
Williams 2005)>. A Gaussian Process is a stochastic process 
with the property that any finite set of samples drawn at dif- 
ferent points of its domain will have a multivariate-normal 
(MVN) distribution. Samples drawn from a stochastic pro- 
cess will be functions indexed by a continuous variable (such 
as a position, time or, in our case, a parameter of the model) 
as opposed to a collection of values as generated by, e.g., a 
normally-distributed random variable. A Gaussian Process 
is completely specified in terms of a mean and covariance, 
both of which can be functions of the indexing variable. The 
unconditioned draws (solid lines) shown in the left panel of 
Figure [T]are smooth functions over the domain space labeled 
x. If enough samples are drawn from the process the aver- 
age of the resulting curves at each point would converge to 
zero. A posterior distribution function can be obtain by con- 
ditioning this process on the training points obtained from the 
model. This forces samples drawn from the process to al- 
ways pass through the training points, as shown in the right 
hand panel of Figure [T] Repeated draws from the conditioned 
posterior distribution would on average follow the underlying 
curve with some variation, shown by the gray confidence re- 
gions. These confidence bubbles grow away from the training 



points, where the interpolation is least certain, and contract 
to zero at the training points where the interpolation is abso- 
lutely certain. The posterior distribution can be evaluated to 
give a mean and variance at any point in the parameter space. 
We may interpret the mean of the emulator as the predicted 
value at a point, the variance at this point gives an indication 
of how close the mean is the true value of the model. Again 
we emphasize that simpler interpolation methods, such as in- 
terpolating polynomials or splines, generally do not provide 
any measure of the accuracy of the method at a given point in 
parameter space. 

To construct an emulator we need to fully specify our Gaus- 
sian Process (GP) by choosing a prior mean and a form for the 
covariance function. The model parameter space is taken to 
be p-dimensional. We model the prior mean by linear regres- 
sion with some basis of functions h(x), we use h(x) = {1}. We 
specify a power exponential form for the covariance function 
with power a ~ 2 which ensures smoothness of the GP draws, 



c(x h Xj) = 9 exp 




+ 6,jO N . (3) 



Here 6*o is the overall variance, the 9 k set characteristic length 
scales in each dimension in the parameter space and 9n is a 
small term, usually called a nugget, added to ensure numeri- 
cal convergence or to model some measurement error in the 
code output. The shape of the covariance function sets how 
the correlations between pairs of outputs vary as the distance 
between them in the parameter space increases. The scales in 
the covariance function 8 k are estimated from the data using 
maximum likelihood methods (Rasmussen & Williams 2005 ), 
in Figure |2]we demonstrate their influence on an artificial data 
set. The linear regression model handles large scale trends of 
the model under study, and the Gaussian Process covariance 
structure captures the residual variations. 

Given a set of n design points T> = {x\,...,x n } in a p- 
dimensional parameter space, and the corresponding set of n 
training values representing the model output at the design lo- 
cations Y = {y\ , . . . ,y n } , the posterior distribution defining 
our emulator is 

V(x,&)~GP (m(x,d),£(x, 
for conditional mean m and covariance S. 

mix) = h(x) T (3 + k T (x)C- l (Y-HP), 

Y*(Xj,x j) = c(xi, xj) + k T (Xj)C~ l k(x j)+T(Xi,Xj), 

Cij = c(x h Xj) (4) 

T(x i: Xj) = (h(x i f-k T (x i )Cr 1 Il) T (H r C _I H) _1 
(h(xj) T -k T (xj)C- l n) , 



k(x) T = (c(x u x),...,c(x n ,x)). 



(5) 



Where m(x) is the posterior mean at x, Y,{Xj,Xj) is the pos- 
terior covariance between points Xj and x,-, C is the n x n 

covariance matrix of the design T>, j3 are the maximum- 
likelihood estimated regression coefficients, h the basis of re- 
gression functions and H the matrix of these functions evalu- 
ated at the training points. 

The elements of the vector k(x) are the covariance of an 
output at x and each element of the training set. It is through 
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FIG. 1. — Left panel: unconditioned draws from a Gaussian Process 
GP(0, 1) with a mean of zero and constant unit variance. In our work, the 
indexing variable .v represents an input parameter of the model, whereas the 
variable y a desired observable. Right panel: draws from the same process 
after conditioning on 7 training points (black circles). The gray band in both 
panels is a pointwise 95% confidence interval. Note how the uncertainty in 
the right panel grows when away from the training points. 

this vector k(x) that the emulator "feels out" how correlated 
an output at x is with the training set and thus how similar the 
emulated mean should be to the training values at those points. 
Note that the quantities defined in Q depend implicitly upon 
the choice of correlation length scales 9 = {9o,9 k ,9n} which 
determine the shape of the covariance function. 

The expression for the mean in Q can be decomposed 
into a contribution from the prior, the linear regression model 
h(x) T (3 plus a correction applied to the residuals determined 
by the covariance structure k T (x)C~ l (Y - Hf3). Similarly the 
covariance can be decomposed into a contribution from the 
prior, the covariance function c(Xj,Xj) plus corrections aris- 
ing from the prior covariance structure and the covariance of 
the new location x through k(x). These terms weight the 
points Xi,Xj more highly the closer they are to the training 
points through k. The T term gives the corrections to the co- 
variance arising from the regression model. 

A Latin Hyper Cube (LHC) design is used to generate the 
training locations in the parameter space. This is an efficient 
design for space-fil l ing in high dimensional parameter spaces. 
( [Sacks et al.| [1989; Sant ner et aL"P 0O3). LHC sampling di- 
vides the range of each dimension of the /^-dimensional input 
parameter space into N equal length intervals. N points are 
then randomly distributed in the resulting grid, satisfying the 
condition that each row and column are occupied by only one 
point for each 2D projection of the design. 

3.2. Model to data comparison 

To compare model output (via the emulator) to experimen- 
tal data, it is convenient to introduce the notion of implausi- 
bility. Following B10, we define an implausibility measure 
as follows. Consider a model with a single output for which 
we have generated an emulator with posterior mean m(x) and 
variance Y,(x,x). The implausibility I(x) of the emulated 
model output at a point x in the parameter space is determined 
by 

(m(x)-E[Y f ]) 2 



I\x) = 



E(x,x) + V[Y f ] 



(6) 



where Yf represents the distribution of field data that we seek 
to compare our model against, E[Yt] the expected value of 
Yf and V[Yf] the observational uncertainties. Large values of 
I(xt) indicate that the input parameter vector Xt is unlikely 
to give a good fit to the observable data. Here we have only 
accounted for the variation from the emulator itself and the 



field data. In the following work we carry out comparisons of 
the model output with idealized field data generated from the 
model itself. As we will show, this can be a very useful ex- 
ercise in understanding a model and its dependence upon the 
underlying parameters. It is common practice (introduced by 
Kennedy & O'Hagan 2001 ) to introduce a bias or discrepancy 
term representing the deviation of the model predictions from 
observed reality; this is beyond the scope of the present anal- 
ysis, in which we compare the model output at different loca- 
tions in the parameter space to certain selected default values. 

The output of ChemTreeN is multivariate — the code pro- 
duces predictions for many observables, such as the distri- 
butions of stellar populations in stellar halos of Milky Way- 
like galaxies or its satellite galaxy luminosity function and 
metallicity-luminosity relation. It is possible to compare sep- 
arately each observable with a model emulator generated from 
the corresponding model output. However, considerably more 
powerful conclusions can be drawn by examining the joint 
properties of the observables and model outputs. Consider 
a t -dimensional vector of model outputs y(x) = {y\ , . . . ,y t } 
with a corresponding vector of field data Yf. We extend our 
training set to be the t x n matrix Y = {y{x^) : y(x n )}. 
We apply a Principal Component Analysis (PCA, Jollife 



2002) decomposition to our training data set Y to obtain 
a reduced set of r -C t approximately independent and nu- 
merically orthogonal basis vectors nearly spanning the t- 
dimensional output space, discarding terms in the eigen- 
decomposition which contribute less than 5% of the total vari- 
ation. We construct individual independent emulators from 
the training values projected onto each basis. When we wish 
to evaluate the model output at a new location we invert this 
transformation to obtain predictive distributions for each of 
the t model outputs at a given location in the parameter space. 
This procedure is detailed in Appendix [A] 

The implausibility |6} has a natural extension to multivari- 
ate outputs (B10, §3.7). From the emulator we obtain a t- 
dimensional vector of predictions for the model output with 
means rh(x). The emulated t xt dimensional covariance ma- 
trix K(x) between the model outputs at the point x in the de- 
sign space 2 can also be constructed from the PCA decompo- 
sition. With these two quantities, we define the joint implau- 
sibility J(x) for observables Yf with measurement variance 
V[Yf ] and mean values E[Yf]: 

J 2 (x)=(E[Y f ]-m(x)) T 

(K(x) + I-V[Yf])~ 1 (E[Y f ]-rh(x)) , (7) 

This covariance-weighted combination of the multiple ob- 
servables gives a reasonable indication of which input values 
x are predicted by the emulator to lead to model predictions 
close to the observed values Yf. The squared implausibil- 
ity score J 2 (x) can be thought of as approximately the sum 
of r squared independent standard normal random variables, 
hence with approximately a distribution, leading in the 
usual way to confidence intervals for J(x) that induce con- 
fidence sets in the input space. In this work we consider 75% 
confidence sets for x. In the univariate case, for example, lo- 
cations x in the parameter space with J(x) < 3 are viewed as 
"plausible", or likely to give model outputs closely reproduc- 
ing the observational data, given the experimental and inter- 
polation uncertainties. 



Here K,y(£c) = Cav{yi(x),yj(.x)]. See equation A7 for more details 
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FIG. 2. — Demonstration of emulator behavior as a function of correlation 
length, B\. In all panels, the solid blue line shows the mean of the emulator 
and the solid gray region is a 95% confidence interval around this region. Left 
panel: fitting with a value of 9\ that is too small (under-smoothing). Right 
panel: shows over-smoothing by using a value of 0\ that is too large. Central 
panel: smoothing with a value of 6\ = 0.68 that was obtained by a maximum 
likelihood estimation method. 

The model parameters x and output Y were scaled and cen- 
tered prior to emulator analysis, which improves numerical 
convergence in the matrix inversions and maximum likelihood 
estimation process. 

4. EFFECTS OF GALAXY FORMATION HISTORY ON BULK 
STELLAR PROPERTIES I: GENERALITIES 

In this Section we examine how observable quantities of our 
final Galactic models, such as properties of their dwarf galaxy 
population, depends on the galaxy's merger history as well as 
on the model parameters that control bulk properties of their 
star formation histories. We start by qualitatively comparing 
the output of models obtained by varying both the merger his- 
tories and the model parameters, and then move on to a more 
rigorous statistical analysis in the following Section. 

In the left panels of Figure |3]we explore the effects that dif- 
ferent parameter values have on Milky Way-like models ob- 
tained after coupling ChemTreeN with the merger tree based 
on simulation MW1. The top panel shows how the Luminos- 
ity Function (LF) of the surviving satellites at z = depends on 
the redshift of the epoch of re-ionization, z r , the baryon frac- 
tion, /bary, and the fraction of metal lost through SNe winds, 
/ esc . As previously discussed by T10, for a given f^. dlJ = 0.05, 
the number of bright satellite galaxies (i.e., M w < -10) does 
not strongly depends on z r . However, we do observe a strong 
dependence of the number of satellites on z r at the faint end 
of the LF. Note that the number of faint dwarfs increases by 
a factor of ~ 2 - 3 when shifting z r from 10 to 6.5. This is 
not surprising, since by moving the epoch of re-ionization 
to lower redshifts we allow star formation to take place in 
small halos (i.e., v c < 30 km s" 1 ) for extended periods of 
time. On the other hand, the number of bright satellites does 
depend on the parameter controlling the amount of available 
gas to form stars, /bary- This parameter basically acts as a 
re-normalization of the satellite galaxy luminosity function, 
shifting it up or down for higher or lower values of /bary, re- 
spectively. In our models, the escape factor of metals does 
not have a significant impact on the luminosity function. This 
is because we have assumed a fixed star formation efficiency, 
and thus complicated physical processes that affect a galaxy's 
star formation histor y, such a s metal cooling, are neglected 
(e.g. [Choi & Nagamine|2009l >. 

It is very interesting to compare these LFs with those ob- 
tained from galaxies that have had different merging histo- 
ries, after fixing the values of all model parameters. Figure 
[4] shows the fraction of mass within /?20o at any given red- 



shift of our four Milky Way-like dark matter halos. Notice 
the range of merger histories spanned by these halos. The lu- 
minosity functions obtained after coupling each halo merger 
tree to ChemTreeN are shown in the top right panel of Figure 
[3] The values chosen for the input parameters in this anal- 
ysis are z r = 6.5, /bary = 0.05 and / esc = 80. The number of 
bright dwarf galaxies is very sensitive to the merger history of 
the parent halo, though there is a significant halo-to-halo vari- 
ation (approximately factor of two) in the overall number of 
dwarf galaxies between the four simulations. It is important to 
notice that the scatter observed in these luminosity functions 
is similar to that previously observed when varying model pa- 
rameters. This suggests a degeneracy between the effects that 
different parameters and merger histories have on the observ- 
able properties of our Milky Way-like galaxies. 

The situation is different for the Luminosity-Metallicity (L- 
Z) relation of the surviving satellite galaxies. We derived each 
satellite's metallicity by obtaining a model metallicity distri- 
bution function (MDF) from the galaxy's particle-tagged stel- 
lar populations and taking its mean value. The right middle 
panel of Figure |3]shows that this relation is not sensitive to the 
merger history of a galaxy. For comparison, we show as well 
the linear fits derived from each data set. This result highlights 
the importance of scaling relations for this kind of analysis, as 
they allow us to put constraints on our models independently 
of the particular galaxy's merger history. In the left middle 
panel we explore dependencies of this relation with model 
parameters. The parameter with the strongest impact is the 
escape factor of metals. Note that by decreasing the value of 
/escp from 80 to 50 we modify not only the slope of the L-Z 
relation but also its intercept, increasing the mean metallic- 
ity of all satellites. Instead, by varying the baryon fraction 
we re-normalize the L-Z relation, keeping its slope constant. 
Although the mean metallically of a given satellite does not 
depends on the value chosen for /bary, its absolute magnitude 
M y does. For a smaller (larger) value of /b ary the whole distri- 
bution of satellites is shifted towards fainter (brighter) abso- 
lute magnitudes, thus effectively decreasing (increasing) the 
average satellite's metallicity a given atM v . We finally notice 
that variations of z r mainly induce changes in the slope of this 
relation. A later epoch of re-ionization allows star formation 
to take place on the smallest galaxies for longer periods of 
time. As a consequence, faint satellites that were fed by the 
smallest building blocks have a larger absolute magnitude at 
z = 0. Note, however, that this mechanism does not strongly 
affect the brightest satellites (M v < -12), since accretion of 
these smallest building blocks does not significantly modify 
their overall luminosity. This is because they are big enough 
to accrete gas and keep forming stars after re-ionization. 

It is also interesting to compare the stellar halos' metallic- 
ity profiles as function of galactocentric radius obtained from 
both simulations with different merger histories and model 
parameters. Notice that after the last major merger event, 
most of the star formation in the host galaxy is expected to 
take place in a dissipationally collapsed, baryon-dominated 
disc. The formation of this galactic component cannot be 
properly accounted for in our dark matter-only A^-body sim- 
ulation. Therefore, for this analysis, we have decided to ne- 
glect all star formation episodes that have taken place in the 
host halo after the redshift of the last major merger (LMM). 
The right bottom panel of Figure [3]shows that, independently 
of the merger history, our four halos present a noticeably flat 
spherically-averaged metallicity profile up to ~ 150 kpc, with 
mean metallicities between -1.3 dex and -1.6. dex. This is in 
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FIG. 3. — Behavior of mock observables as a function of model parameters and dark matter halo merger history. Left column: Dependence of different 
mock observables on the input parameters of the model. From top to bottom panel, we show the satellite galaxy luminosity function, satellite luminosity- 
metallicity relation, and the mass-averaged metallicity profile of the primary galaxy's stellar halo for five different models. The models were obtained by 
coupling ChemTreeN with the W-body simulation MWI. The legend on each panel indicates the values of the parameters associated with each model. Right 
column: Dependence of different mock observables on galaxy merger history. From top to bottom panel, we show the satellite galaxy luminosity function, 
satellite luminosity-metallicity relation, and the mass-averaged metallicity profile of the primary galax's stellar halo for four different models. These models were 
obtained by coupling ChemTreeN with the four available iV-body simulations. To generate these models we fixed the values of all model parameters. The values 
chosen for this example are (z r . /bary, /esc) = (6.5, 0.05, 80). 



agreement with previous studies that have found flat metallic- 
ity profiles on halos formed purely thro ugh accretion of large 
number of s atellites (see e.g. |De Lucia & HeTm i 2008 ; Cooper] 
|et al.|[20T0| Monachesi et al., in prep.). Further more, recent 
observations on the halos of M31 (e.g. Richardson et al. 2009| 
Kalira Fet al.||2006), M81 (Mon achesi et al., in prep.) and 
tie Milky Way ( |Ma et al.||2012| in prep.) have shown indi- 
cations of flat or relatively mild metallicity profiles outside 
~ 20-30 kpc; radius at which accretion from satellites is ex- 
pected to become the dominant co ntributio n as opposed to 
in-situ star formation ( |Zolotov et a l. 2009 2 010| |Font et aT] 



|201 la) . Note that our results a re not in contradiction w ith the 
dual stellar halo proposed by Carollo et al. ( 2007 2010) since, 
as previously discussed, in situ star formation is n ot prop- 
erly accoun ted for in these simulations. In addition, [Cooper 



et al. (2010} showed that stronger metallicity gradients can 
be found in halos with a small number of relatively massive 
progenitors. 

On the left bottom panel of Figure [3] we show that this re- 
sult is robust to variations of the model's parameters. The 
parameter that most strongly affect this profile is / esc since, 
as discussed above, this controls the mean metallicity of the 
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FIG. 4. — Galaxy formation history as shown by the virial mass of the most 
massive progenitor of our Milky Way-like dark matter halos as a function of 
the expansion factor. In all cases, the mass is normalized to the z = mass of 
the galaxy. Notice the range of merger histories spanned by these halos. 

host halo's building blocks. Nevertheless, variations of this 
parameter result on final stellar halos with approximately fiat 
metallicity profiles. Note that the scatter induced by varying 
the model parameters is similar to that obtained by sampling 
different merger histories. 

5. EFFECTS OF GALAXY FORMATION HISTORY ON BULK 
STELLAR PROPERTIES II: STATISTICAL ANALYSIS 

In the previous Section we have shown qualitatively that 
certain observable quantities, commonly used to constrain the 
input parameter space of our semi-analytic model, depend 
strongly on the particular merger history of a galaxy. Fur- 
thermore, as shown previously by BIO in a different context, 
simultaneous variations of certain groups of parameters can 
yield similar observables, revealing important connections 
among the underlying physical mechanisms. The present dis- 
cussion highlights the need for robust statistical tools capable 
of exploring the input parameter space densely, as well as the 
need for quantitative characterization of the effects that dif- 
ferent merger histories have on the present-day distribution 
of observable quantities. In what follows we will show how 
model emulators are well-suited to address this kind of prob- 
lem. 

5.1. Parameter space exploration 

As discussed in Section [3] the first step in constructing a 
model emulator is to train a suite of Gaussian process priors 
using a finite set of model outputs. These outputs are ob- 
tained by running ChemTreeN using different sparsely sam- 
pled sets of input parameters drawn from an experimental de- 
sign T> = {x\, . . . ,x„}. For simplicity, we will first take x t to 
be a three-component vector, a;,- = (z' r , f£ sc , f{, mj )- Within this 
framework it is trivial to increase the dimensionality of x i7 
but interpreting and visualizing the final implausibility sur- 
faces become progressively more challenging. (For examples 
of this kind of analysis with larger dimensionality see Ap- 
pendix |B]or BIO.) 

The training set consists of a number n = 200 of design 
points T> = {x\ , . . . , x n } C K 3 , drawn using LHC sampling. 
This gives an acceptable balance between adequate coverage 
of the input space and acceptable run time. The input param- 
eters are allowed to vary within the ranges specified in Table 
12 The full model ChemTreeN is then run at each of these 
200 training points. The next step is to select a set of outputs, 



Y = {yi , . . . ,y r }, for which individual emulators will be con- 
structed. Note that the output selection is determined purely 
by the set of t observables or field data, Yf = {y/j , . . . , y/j}, 
chosen. Motivated by our discussion in Section 14] we chose 
to emulate five values of the satellite galaxy luminosity func- 
tion, each at a different absolute magnitude, in addition to 
the slope and the intercept of the satellite galaxy luminosity- 
metallicity (L-Z) relation. This gives us a total of t = 7 outputs 
to be extracted from the models. As we show below, each of 
these outputs most strongly constrains different model param- 
eters. In Figure [5]we show the luminosity functions and L-Z 
relations extracted from the training set models, obtained after 
running ChemTreeN on the design points T>. For clarity, only 
the result of a linear fit to each L-Z relation is shown. Note the 
great diversity of outputs that can be obtained by varying the 
input parameter of the model for a fixed merger history. The 
black dashed lines on the left panel indicate the five values of 
M v chosen to sample the LFs. 

After training seven model emulators (one for each observ- 
able), we compare the model (via the emulators) to the ob- 
servable data by calculating individual surfaces of implausi- 
bility, l evel sets of I(x) for each observable. As described in 
Section [X2"l these three-dimensional surfaces bound the set of 
input parameter vectors x that are more likely to reproduce 
the observed data set Yf, In principal, the observable data 
should be obtained from the luminosity function and L-Z re- 
lation of the Milky Way satellite galaxies. However, to test 
the constraining power of this approach when the "truth" is 
known, a particular run of the ChemTreeN model will be used 
as a mock observable data set. This type of controlled exper- 
iment can be very helpful in model performance assessment, 
since we know exactly what values of the input parameters 
were used to obtain the artificial "field data;" it also obviates 
the need for modeling discrepancy. The black solid lines in 
Figure [5]show the luminosity function and L-Z relation of the 
model used as the mock observations. The values of the input 
parameters used are x b s = (z r , / esc , /bary) = (10, 50, 0.05). As 
shown by T10, this set of parameters provides a reasonable fit 
to the actual field data. It is important to note that this input 
parameter vector is not included among our design points T>. 

In Figure [6] we show three different sections of each of the 
implausibility surfaces obtained from the five model emula- 
tors constructed for the LF's outputs. The 3-dimensional im- 
plausibility surfaces are sliced with three orthogonal planes 
as defined by the components of a; b s . The top row panels 
show the / esc = 50 section of the I(x) surfaces. The black 
dashed lines indicate the values of the remaining two com- 
ponents of a; bs. Given an input parameter vector x t , the 
larger the value of the I(xt), the less likely is that a good fit 
to the observable data can be obtained. From the left-most 
panel (i.e., M v = -16.5) it becomes clear that the parameter 
controlling the amount of available gas to form stars, /bary, is 
strongly constrained by the number of satellite galaxies at the 
bright end of the satellite galaxy luminosity function. Further- 
more, within the range of values considered here, the number 
of satellites at this M v is independent of the redshift of the 
epoch re-ionization, z r . The most plausible parameter values 
are near the true value of fi, my = 0.05. As we move towards 
the faint end of the luminosity function the model parameter 
z r becomes progressively more constrained and the total num- 
ber of satellite galaxies becomes less dependent on /bary- For 
M v = -3.5 (top right-most panel). The corresponding model 
emulator strongly constrains the input parameter space around 
values of z r ~ 10, but it gives equally good fits for nearly all 
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Fig. 5. — The yellow/red solid lines show the satellite galaxy luminosity functions (left panel) and satellite galaxy luminosity-metallicity relations (right panel) 
extracted from a set of 200 models used to train our suite of model emulators. For clarity, only the result of a linear fit to each luminosity-metallicity relation is 
shown. Note the great diversity of outputs that can be obtained by varying the different input parameters. The models were obtained after coupling ChemTreeN 
with the W-body simulation MW1. The black solid line on both panels indicate the model considered to be the galaxy's "true" observational quantities, obtained 
after running ChemTreeN with the input parameter vector x ^ s . The vertical dashed lines on the left panel indicate the five values of M v chosen to sample the 
LFs. We sample each L-Z relation by extracting slopes and intercepts from the corresponding linear fits. A different model emulator was constructed for each of 
these seven model outputs. 
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FIG. 6. — Three different sections of each Implausibility surfaces, I(x), obtained from the five model emulators constructed for the LF's outputs. The output 
being emulated is indicated on the top right corner of each panel. Note that columns correspond to different observables. The 3D implausibility surfaces are sliced 
with three orthogonal planes as defined by the components of x a ^ s . The different colors show different values of I(x) in logarithmic scale. Low implausibility 
values are indicated in light colours. Note that, given an input parameter vector xt, the larger the value of the I(xt), the less likely a good fit to the observable 
data can be obtained. The top, middle and bottom row panels show the / esc = 50, /bary = 0-05 ar| d & = 10 sections of the I(x) surfaces, respectively. The black 
dashed lines indicate the values of the remaining two components of x ^ s . Model emulators are compared to the mock observable data obtained after running 
ChemTreeN with the input parameter vector x a b s . Both mock observables and training data set are obtained by coupling ChemTreeN with the W-body simulation 
MW1. From these model emulators it is possible to constrain strongly the parameters /i,aiy and z r , but not / eS c. 
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FIG. 7. — As in Figure [6]for the two model emulators constructed for the 
L-Z relation. The left ana right panels show sections of the implausibility 
surfaces associated with the slope and the intercept, respectively. Note that 
these model emulators can provide strong constrains to the model parameter 
/esc- 

possible values of /bary. The second row of panels show sec- 
tions of the I(x) surfaces at /b ary = 0.05. The satellite galaxy 
luminosity function appears to be completely independent of 
the value adopted for the escape factor of metals, / e sc- This 
reflects the lack of a metallicity dependence on the baryon 
budget and star formation rate. At the bright end of the lu- 
minosity function, any combination of z r and / esc would yield 
an equally good fit to the mock observable data. However at 
the faint end values of z r rj 10 are required to fit the mock 
data. A similar result can be obtained for the third row of 
panels showing the remaining sections, i.e., z r = 10. Again, a 
good fit to the "observable" data can be obtained with values 
°f Aary ~ 0.05 for any possible value of / esc . 

It is possible to put constraints on the parameter / esc by 
exploring cross-sections of the implausibility surfaces con- 
structed for the satellite galaxy luminosity-metallicity rela- 
tion's slope and intercept model emulators. The middle and 
bottom panels of Figure [Tjshow the sections defined by /b ary = 
0.05 and Zt = 10 , respectively. Comparison with Figure [6]re- 
veals implausibility surfaces with a more complex topology. 
Although both emulators present regions of low implausibil- 
ity for a wide range of / esc values, these regions are strongly 
associated with z r and /bary These two parameters are also 
strongly constrained by the slope and intercept of the L-Z re- 
lation, as shown in the top row panels. 

Individually, none of the previously explored implausibil- 




150 20 



FIG. 8. — Iso-implausibility surfaces extracted from the joint implausibility 
measure J(x). Redder colors indicate larger values of J(x). As the value 
of J(x) decreases the volume enclosed by each iso-surface becomes smaller, 
converging towards the values associated with tCobs, as shown by the red solid 
lines. The region of lowest implausibility (and thus highest plausibility) is 
shown by the opaque blue volume at the intersection of the red lines. 

ity surfaces can constrain the full parameter space. This is 
not the case with the joint implausibility measure J(x), which 
combines the information obtained from the seven model em- 
ulators into one quantity 3 . Figure [8] shows different iso- 
implausibility surfaces of the resulting J(x). Notice that as 
the value of J(x) decreases the volume enclosed by each iso- 
surface becomes smaller, converging towards the values as- 
sociated with a; bs, as shown by the red solid lines. This can 
be more clearly appreciated in Figure [9] Each row of panels 
shows different sections J(x) as we traverse one of the three 
possible dimensions. The black solid line on the color bars 
show the cutoff applied to the joint implausibility. A value of 
J(x) > 3 (in t = 7 dimensions) indicates that it is implausible 
to obtain a good fit to the observed data with the correspond- 
ing values of the model parameters. Thus, regions of the pa- 
rameter space lying above this threshold can be disregarded. 
We find that J(x) strongly constrains the full parameter space 
under study. Furthermore, the values of the components of 
a; D bs are located in the most plausible regions of the space, as 
indicated by the star symbols in the corresponding panels. 

As described by B10, it is possible to further constrain the 
input parameters by performing an iterative approach. The 
idea consists in isolating the region below the chosen implau- 
sibility threshold and generating a new training set using de- 
sign points located within these regions. The kind of "con- 
trolled" experiments performed here, in addition to the low di- 
mensionality of our previous example, renders this approach 
unnecessary. Thus, we refer the reader to B10 for thorough 
description of this procedure. 

5.2. Impact of merger histories 

In the previous subsection we showed that it is possible to 
recover the set of input parameter chosen to create a mock 
Milky Way-like observational data set using a suite of model 
emulators. In this "controlled" experiment, both training and 
mock observable data were obtained by coupling ChemTreeN 
to a merger tree obtained from a single simulation. Therefore, 

3 Recall that, when dealing with more than one observable simultaneously, 
model emulators are constructed in the principal component space of the 
training set. 
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FIG. 9. — Different sections of the Joint implausibility surface, J(x), obtained by combining information provided by the seven model emulators shown in 
Figures|6]and^] The different colors show different values of J(x) in logarithmic scale. Model emulators are compared to the mock observable data obtained 
after running CnemTreeN with the input parameter vector x a \, s . Both mock observables and training data set are obtained by coupling ChemTreeN with the 
/V-body simulation MW1. The top, middle and bottom row panels show different sections of constant /esc> /bary and Zr, respectively. On each row, the black 
dashed lines indicate the values of two of the components of £c |j S . If the three components are simultaneously present in a section, the location of a3 t, s is indicated 
with a blue star. The horizontal black solid line on the color bars indicate the imposed threshold: a value above this threshold shows that it is very implausible 
to obtain a good fit to the observed data with the corresponding values of the model parameters. Note that J(x) can strongly constrain the full parameter space 
under study. Note as well that the values of the components of x b s are located in the most plausible regions of the space. 



we have implicitly assumed that the exact merger history of 
our Milky Way-like galaxy is a known variable of the prob- 
lem. In reality, this merger history is poorly known, and could 
be regarded as an extra input parameter of the model. It is thus 
important to study how different merger histories can compro- 
mise our ability to meaningfully constrain the input parameter 
space. To this end, we perform the following set of controlled 
experiments. Using the merger trees extracted from the four 
available A^-body simulations we generate four different train- 
ing sets (each training set containing n = 200 design points) 
and construct, for each set, the suite of model emulators dis- 
cussed previously. Hereafter, we will refer to these emulators' 
sets as "MW/-emulators", with i= 1, 2, 3 and 4. The input pa- 
rameter vector x bs = (z r , / eS c, /bary) = (10,50,0.05) is used 
to obtain a mock observational data set from each merger 
tree. We will refer to these mock observables as "MW7- 
observables". We then ask the following question: 

Is it possible to recover the input parameter vec- 
tor, a; b s , if we use training data obtained from a 
merger tree different than that used to generate 
the mock observables? 



In Figure 10 we show the outcome of this experiment. Each 
block of four panels shows sections of the joint implausibil- 
ity surface obtained after comparing given MW/-observables 



with the four MW/-emulators. The merger tree used to gener- 
ate the MW/-observables in each block is indicated with the 
green label and marked "OBS". For example, the top left cor- 
ner shows comparison results using MWl-observables. As 
in Figure [9] when the model emulators are trained on the 
same merger tree used to generate the mock observables, we 
can successfully constrain the input parameter space and re- 
cover the components of a; Q b s . However, when model em- 
ulators constructed on different merger trees are considered, 
the most plausible regions are located around values of /b ary 
much larger than those used to obtain the mock observables. 
This is not surprising since, as shown in Figure [3] MW1 is 
the Milky Way-like halo that contains the largest number of 
satellites at all M v . To achieve a good fit to MWl-observables 
in the remaining simulations requires a larger amount of gas 
to form stars. Note as well that the joint implausibility sur- 
face obtained with the MW3-emulators never falls signifi- 
cantly below the chosen threshold, so MWl-observables can- 
not be reproduced using the merger history extracted from 
halo MW3. Another interesting example is shown on the 
lower right panels of Figure [TUJ where we consider MW4- 
observables. Very good fits to these observables can be ob- 
tained for either larger (MW3-emulators) or smaller (MW2- 
emulators) values of /bary than that used to generate the mock 
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FIG. 10. — Sections of Joint Implausibility surfaces' at constant / CS c, obtained after comparing different models and mock observables. The different colors 
show different values of J(x) in logarithmic scale. Each block of panel shows the results of comparing a given MWi-observables to the four sets of MWt- 
emulators (see text), where k, i= 1, 2, 3 and 4. MWi-observables are obtained by running ChemTreeN on the merger tree extracted from simulation MWi, 
using the input parameter vector £E b s . The labels on the top left corner of each panel indicates the MW&-emulators being consider. In green we indicate the 
MWi-observables associated with each block. The section were selected such that the three components of £c b s are present. On each panel, the white dashed 
lines indicate the values of two of the components. The horizontal black line on the color bars indicate the imposed threshold: a value above this threshold shows 
that it is very implausible to obtain a good fit to the observed data with the corresponding values of the model parameters. Note that, in many cases, similarly good 
fits to a given set of observables can be obtained with different parameter's values simply by modifying the host's merger history. These values may significantly 
differ from those used to generate the mock observables. 



observables. A similar situation is observed for the input pa- 
rameter Zf Note that we have only considered the / esc = 50 
section of each joint implausibility surface. As described in 
Section |4] the satellite galaxy luminosity-metallicity relation 
is not sensitive to the merger history of the host halo. The 
parameter / esc is therefore well constrained in all cases. This 
forces the J{x) surfaces to have the most plausible regions in 
the vicinity of the aforementioned section of J(x). 

The previous analysis clearly shows how a particular 
merger history can influence the model parameter selection: 
similarly good fits to a given set of observables can be ob- 
tained with different model parameter values simply by modi- 
fying the host's merger history. In our experiments these val- 
ues may differ from those used to generate the mock observ- 
ables. When comparing with real observational data, a given 
set of best fitting parameter's values may be significantly off 



from the values that could best parametrize the desired un- 
derlying physical processes. This in turn may have important 
implications on other observable quantities that we would like 
to study and which have not been used for model parameter 
selection. 

5.3. Model comparison 

The previous analysis illustrated the importance that a 
host's merger history has on model parameter estimation. For 
a given set of mock observables, locations of low-joint im- 
plausibility regions may significantly vary simply by chang- 
ing the host merger history. However, it remains to be ex- 
plored whether differences in the resulting joint implausibility 
surfaces obtained with, e.g., MW4-observables (bottom right 



panels of Figure 10 1 could allow us to isolate a "most likely" 
merger history for this mock data. Is it possible to constrain 
the host merger history by statistically comparing these sur- 
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faces? 

This question can be addressed within a Bayesian frame- 
work for model quality assessment. The posterior probability 
for a model Mj, given an observable data set Yf, is given by 
the Bayes' theorem (Gregory 2005 ), 



P(Mi | Yf) : 



P(Y f | Md P(Mj) 
P(Yf) 



(8) 



where the denominator P(Y f ) = £\ j P(Y f \ Mj)P(Mj) is the 
marginal probability density of Yp, and P(Yf \ M,) is the prob- 
ability density of Yp under model M,, marginalized over the 
model parameters, x, representing the probability density for 
the observable Yf under the assumption of model M,. In this 
analysis, a model M, represents the coupling of ChemTreeN 
with a given cosmological simulation. In model selection 
problems it is often more useful to consider the ratio of the 
probability of two models rather than their probabilities di- 
rectly. The quantity 



O; 



P{Mj I Yf) = P(Y f | Mi) P{Mj) 
P(Mj | Y f ) P(Y f | Mj) P(Mj) 



(9) 



is known as the odds ratio in favor of model M, over model 
Mj. The first factor on the right side of |9]) is known as the 
Bayes factor, 



B, 



P(Y f | Mi) _ JP(x | Mj) P(Y f | x,M t ) dx 
P(Y f | Mj) ~ JP(x\Mj)P(Y f | x,Mj) dx' 



(10) 



If we assign equal prior probabilities for all models, then odds 
ratios and Bayes factors coincide: 

O u =Bij 

In our problem we approximate B,j by calculating the like- 
lihoods, C = P(Yf | x,Mj), under a multivariate normal ap- 
proximation from the corresponding joint implausibility, i.e., 
C cx f e~ y <( x) I 2 dx, with uniform priors P(x \ Mj). Here, Jj(x) 
is obtained using MW/-emulators. 

While the use of Bayes factors to compare and select among 
models is well established, a variety of scales have be en pro- 
posed to fac ilitate their quantitative interpretation (Fadely & 
|Keeton|2012| l. We will employ Jeffreys' scale (Jeffreys 1998 ), 
presented in Table [3] According to this scale, values of 
\og l0 Bjj between and 0.5 indicate that models M, and .My are 
equally supported by the observable data set Yf. Conversely, 
a value larger than 2 gives a decisive evidence for model M ; 
against model Mj, given Yf. 

In this work we are dealing not only with four different 
models, but also with four different mock observable data 
sets. From now on, we will refer as By to the Bayes factor ob- 
tained comparing models M, and Mj using MW^-observables. 
The results of this analysis are shown in Figure 1 1 in the form 
of a matrix. The elements of this matrix show, coded by color, 
the values of the different Bayes factors. Note that the matrix 
is divided into four blocks, each one of them associated with 
a different MWfc-observable. For comparison, the panels are 
distributed as in FigureflO] For example, the first block on the 
top left corner shows theprobability ratios of model M\ and 
Mj, with j = 1,2,3 and 4, using MWl-observables. The first 
element of this block is, not surprisingly, B\ l = as we are 
comparing model Mi with itself. It is interesting however to 
observe that, on the basis of MWl-observables, model Mi and 
M2 are equally plausible, as indicated by B\ 2 . Comparison 



TABLE 3 

Jeffreys' scale for grading the significance 
associated with different ranges of the 
logarithmic bayes factor. 



log 10 B,j 


Significance 


0-0.5 


Barely worth mentioning 


0.5-1 


Substantial 


1-1.5 


Strong 


1.5-2 


Very strong 


> 2 


Decisive 



with Figure 10 (top left panels) shows that the combination 
of model M2 and MWl-observables yield plausible values of 
/bary that are much larger than that used to generate the mock 
observables (i.e., /bary = 0.05). A similar situation arises when 
considering MW4-observables. In this case, models M2, M3 
and M4 are all equally supported by the mock observable data. 
However, the corresponding joint implausibility surface's sec- 
tions, shown on the bottom left panels' block of Figure [TUj 
suggest that the most plausible values of the input parame- 
ter for models M2 and M3 differ significantly from the "true" 
values used to generate the MW4-observables. 

This analysis has shown us that multiple models with differ- 
ent merger tree histories can be equally supported by a given 
observational data set. The "best fitting" parameters extracted 
from these models are, however, significantly different from 
each other. On the basis of this analysis it is therefore not pos- 
sible to assess what set of parameters would be best suited to 
model the formation of the stellar halo of the Milky Way. 
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FIG. 1 1 . — Values of the log 10 Bayes factors obtained by comparing two dif- 
ferent models, after fixing a given mock observable data set. The values are 
presented in logarithmic scale. Each element B* shows the result of compar- 
ing models M, and Mj using MWl-observables. The matrix is divided in four 
block, each of them associated with a differe nt M Wfc-observable. For com- 
parison, the panels are distributed as in Figure flO] The values of the different 
Bayes factor are color coded according to the Jeffery's scale, introduced in 
Table 3l Each element log 10 BJ ( = 0, since model M, is being compared to 
itself. Tvlore than one model can be equally-well supported by a given mock 
observable data, as shown by the multiple white and light-grey elements of 
each matrix's blocks. Note that the "best fitting" parameters e xtra cted from 
each model may differ significantly across data sets (see Figure [To). 
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In this work we combined modern statistical methods 
with cosmological simulations of the formation of the satel- 
lite galaxy population and stellar halo of Milky Way-like 
galaxies to explore how one might meaningfully constrain 
a galaxy's formation history using observables related to 
these components. The semi-analytic galaxy formation model 
ChemTreeN, coupled to merger trees extracted from four cos- 
mological TV-body simulations, is used to track the evolution 
of the baryonic properties of Milky Way-like galaxies and 
their satellite populations. Gaussian Process model emulators 
are used to emulate the outputs of ChemTreeN at any location 
of its p-dimensional input parameter space. This allow us to 
explore the full input parameter space orders of magnitude 
faster than could be done with the ChemTreeN code itself. To 
compare observational data to the model using these emula- 
tors we introduce the joint implausibility measure (see also 
BIO). We show that, using this quantity as a measurement of 
correctness, we can successfully recover the input parameter 
vector used to generate mock observational data sets (namely, 
the satellite galaxy luminosity function of our chosen Milky 
Way-like dark matter halos and the slope and normalization 
of the satellite galaxy luminosity-metallicity relation). 

From our analysis we draw the three following main con- 
clusions: 

1. Using our Gaussian Process model emulator, we have 
explored the dependence of a variety of observational 
quantities on model parameters. In some cases, this 
highlights straightforward relationships between obser- 
vational quantities and model parameters: for example, 
the number of low-luminosity dwarf galaxies depends 
very sensitively on the redshift of re-ionization, but is 
insensitive to other model parameters. On the other 
hand, the number of high-luminosity dwarfs is insen- 
sitive to the redshift of re-ionization, but sensitive to 
the reservoir of gas available for star formation. For 
simplicity, throughout this work we have focused our 
analysis on a three-dimensional input parameter space. 
However, as shown in Appendix[B]and BIO, the method 
can be extended in a straightforward manner to much a 
higher-dimensional parameter space. Thus, techniques 
like the one presented here are well suited not only to 
explore the input parameters of complex models but 
also to expose and visualize the non-linear coupling be- 
tween the parametrization of the different physical pro- 
cesses being considered. 

2. The detailed merger history of a galaxy can very sen- 
sitively affect the outcome of the procedure for find- 
ing the "best fitting" parameters to a given mock obser- 
vational data set. Similarly good fits can be obtained 
with different parameter values simply by modifying 
the host galaxy's merger history. More importantly, in 
our experiments these values may strongly differ from 
those used to generate the mock observables. This is 
true even when the input TV-body simulations have been 
carefully screened to all have Milky Way-like masses 
and times of last major mergers. When comparing with 
real observational data, the resulting best fitting param- 
eter values may differ significantly from the values that 
could best parametrize the desired underlying physical 
processes. This in turn may have important implica- 



tions on other observable quantities that we would like 
to study and which have not been used for model pa- 
rameter selection. 
3. Using a Bayesian framework for model quality assess- 
ment, we have explored whether it is possible to iden- 
tify (or at least constrain) the merger history of the 
galaxy from which observables have been obtained. 
This exercise showed that multiple models with differ- 
ent merger histories can be equally supported by a given 
observational data set. The "best fitting" parameters ex- 
tracted from these models do, however, differ signifi- 
cantly from each other. On the basis of this analysis it 
is therefore not possible to assess what set of parame- 
ters would be best suited to model the formation of the 
Milky Way's satellite galaxy population and its stellar 
halo. 

The results presented in this work highlight the dangers of 
selecting model parameters based on simulations and obser- 
vations of individual objects such as the Milky Way. A way 
to partially circumvent this problem is to fix the values of 
a subset of parameters such that it is possible to reproduce 
properties of the galaxy population in the local Universe as 
well as at higher redshift. This approach has been success- 
fully adopted by several authors to study the formation of the 
Milky Way S tellar halo and characterize its satellite popula- 
tion (see, e. g. De Lucia & Helmi 2008| Li et al.|2010 |Cooper 
et al.pOlO Font et alTI 201 lb| l. It is however not surprising 



that important physical process involved in the evolu tion of, 
e.g., the faintest Milky Way dwarf galaxies (see e.g., | W illman 
[etatl[2005) |Belokurov et"aT1[2Ci09l [20071 |Walsh et al. 12007) 
are poorly constrained by the properties of the much larger- 
scale galaxy population. Statistical methods such as the one 
presented here are very well suited to visualize the interplay 
between different physical mechanism and thus can be used 
to further constrain the parameters that are more relevant to 
the evolution of dwarf galaxies. 

We have seen that different models of the Milky Way stellar 
halo, each with a different formation history, can be equally 
supported by a given observational data set even though the 
values extracted for the best fitting parameter may signifi- 
cantly differ. This seems to compromise the possibility of 
constraining the formation history of the Milky Way through 
this kind of analysis. Note, however, that our results are based 
on merger trees extracted from only four cosmological simu- 
lations. Therefore, it remains to be explored whether statisti- 
cal constraints on the MW's merger history could be obtained 
by analyzing a large set of high resolution cosmological simu- 
lations densely sampling full range of possible formation his- 
tories. We defer this analysis to a later paper. 
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APPENDIX 



A. MULTIVARITE EMULATOR FORMALISM 

Here we outline the details of constructing an emulator for models with multidimensional output y = {y\, . . . ,y t }. In theory 
each component could be considered independent of the others and emulated separately, as we have shown above by examining 
the implausibilities I{x) this does not usually give satisfactory results. Principal Components Analysis (PCA) is used to generate 
an orthogonal decomposition of the output vector y. The resulting basis is orthogonal and approximately independent. We retain 
the r eigen-functions forming a subset of the full decomposition responsible for 95% of the observed variation. The set of training 
observations Y = {y\ , . . . , y n } is then projected into this new orthogonal basis and independent emulators are generated for the 
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weights of the r basis functions. Given our set of n observations Y = {y\ , . . . , y„} we suppose that y-, ~ MVN(/i, E) where 

1 " 

1=1 
i " 

s « s = - J2 [(» - A/) r (y« - A/)] , (A2) 
i=i 

are the sample mean vector (of length t) and sample covariance matrix (f x f) respectively. An eigen-decomposition of the sample 
covariance matrix E 

E = UAU f , (A3) 

determines the PCA decomposition. The columns of the matrix U are the eigenvectors of E, while A is diagonal with the 
corresponding eigenvalues in decreasing order. The eigenvectors define the new orthogonal basis and the eigenvalues determine 
the weights of each basis function. 

The sum of the eigenvalues or the trace of A is also the trace of S, the sum of the sample variances of Y, so each eigenvalue 
represents the variance contribution of its associated eigenfunction to the observed total. The first eigenvector gives the direction 
in which the data Y are most variable, and the remaining eigenvectors correspond to orthogonal directions with succesively 
smaller amounts of variation. Typically a small number r -C t of the largest eigenvalues corresponding to the most variable 
combinations of the observables y are sufficient to describe nearly all of the variation of the full data set. PCA can be used for 
dimension reduction by keeping only the r most influential components, and regarding the remaining t-r dimensions as noise 
and discarding them. 

E = UAU' « U r A,.U^. (A4) 

The variability fraction V(r) = Xw=i ^//Tr(A) accounted for by the first r components can be used to help select of r; in this 
analysis we chose the smallest r for which V(r) > 0.95. Typically with t = 7 we could attain this with r = 5. 

After carrying out the eigen-decomposition and selecting an appropriate r, we project our training set Y' into the associated 
PCA basis as 

Z=4=U^(Y-A), (A5) 

V A r 

an r x n matrix of the projections of the training data projected into the r independent and orthogonal components of the PCA 
decomposition. Note that Z is standardized to have mean zero and unit covariance. Now we can construct r independent 
emulators, each one trained on a component of Z. 

Each orthogonal emulator is constructed as outlined in Section [3] giving a posterior mean m z (x) and variance E z (ce) Q. 
The mean and variance are functions of the location in the design space x it is important to remember that they are functions 
which give output in the PC space. To obtain predictions for the model outputs Y we need to undo the PCA decomposition by 
reprojecting back into the natural observable space. Naturally by selecting r < t we have lost some of the original information 
however with judicious choice of r this is usually not a serious issue. 

The projected mean, a vector of length t , m(x) in the Y space is given as 

m(x) sa jjL+X5 r k l J 2 m z {x), (A6) 

where m z (x) is the length r vector of emulator means in the PC space. We label the emulated covariance of the Z'th and j'th 
model observables at the location x, as Kij(xi), which is 

r 

K Ij (x i ) = Cow[y,(x i ),yj(x i )]~ fZ/aA^f/^A^VartZ^^,)], (A7) 

where Var[Zp(xi)] is the emulated variance of the /?'th PC dimension at the location a;, in the parameter space. This gives a 
useful estimate of the covariance between our y observables at as yet untried input locations and the t x t matrix K is crucial for 
the joint implausibility J(x). 

B. FIVE-DIMENSIONAL PARAMETER SPACE EXPLORATION 



As discussed in Section 5.1 model emulators can be easily generalized to deal with input parameter spaces of much larger 
dimensionality than previously considered. In this appendix we demonstrated this by training a suite of model emulators in a 
five-dimensional input parameter space. The dimensionality is increased by including in the analysis the star formation efficiency, 
e», and the Type II SNe yield, mp e . The training data set is constructed by running ChemTreeN on a design V = {a? l7 . . . ,x n }, 
containing a number n = 240 design points. Note the larger number of design points with respect to previous examples, intended 
to better sample the larger volume of this input parameter space. A mock observational data set is generated by considering the 
input parameter vector a; Q b s = (z r , / esc , /bary, e*, Wp e ) = (10, 50, 0.05, 10~ 10 , 0.07). Both, mock observables and training data 
set are obtained by coupling ChemTreeN with the merger tree extracted from the dark matter-only A^-body simulation MW 1 . The 



results are shown on Figure 12 Each panel shows a different section of the resulting Joint implausibility surface. These 2D 



sections are obtained after fixing the remaining three parameters to the values associated with £c b s . The black solid line on the 



18 



F. A. Gomez & C.E. Coleman-Smith et al. 



color bars show the threshold applied to the joint implausibility. Values above this threshold indicate that it is very implausible 
to obtain a good fit to the mock observational data using the corresponding values of the model parameters. As observed in the 
three-dimensional example shown in Figure [9] J(x b s ) can strongly constrain the full parameter space under study. Even in this 
higher-dimensional space, the values of the components of :E obs are always located in the most plausible regions of the space, as 
indicated by the black dashed lines. 



x1(T 10 X10™ x10~'° 
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FIG. 12. — Different sections of the Joint implausibility surface, J(x), obtained after training model emulators in a five dimensional input parameter space. 
The mock observable data is generated by feeding ChemTreeN with the input parameter vector x^ — (Zr, /esc, /bary, 6*, M^) = (10, 50, 0.05, 10" 10 , 0.07). 
Each 2D section is obtained after fixing three out of the five parameters to the values associated with the component of x ^ s . The black dashed lines indicate the 
values of the remaining two components. The different colors show different values of J(x) in logarithmic scale. The horizontal black solid line on the color 
bars indicate the imposed threshold: a value above this threshold shows that it is very implausible to obtain a good fit to the observed data with the corresponding 
values of the model parameters. Note that J(x) can strongly constrain the five dimensional input parameter space under study. Note as well that the values of the 
components of x ^ s are always located in the most plausible regions of the space. 



